Approximating Hierarchy-Based Similarity for WordNet Nominal Synsets using Topic Signatures

Authors

  • Eneko Agirre
  • Enrique Alfonseca
  • Oier Lopez de Lacalle
Abstract

Topic signatures are context vectors built for concepts. They can be acquired automatically for any concept hierarchy using simple methods. This paper explores the correlation between a distributional semantic similarity based on topic signatures and several hierarchy-based similarities. We show that topic signatures can be used to approximate link distance in WordNet (0.88 correlation), which enables various applications, e.g. classifying new concepts in existing hierarchies. We evaluated two methods for building topic signatures (monosemous relatives vs. all relatives) and explored a number of different parameters for both methods.
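
As a rough illustration of the comparison the abstract describes, the sketch below treats each topic signature as a sparse word-to-weight vector, scores concept pairs by cosine similarity, and correlates those scores with a similarity derived from link distance. Everything in it is an assumption made for illustration: the toy signatures, the path lengths, the `1 / (1 + path length)` conversion, and the choice of cosine and Pearson correlation are not taken from the paper, which may use different weighting schemes and measures.

```python
import math


def cosine(sig_a, sig_b):
    """Cosine similarity between two sparse topic signatures (word -> weight)."""
    dot = sum(w * sig_b.get(term, 0.0) for term, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def link_similarity(path_length):
    """Turn a link distance into a similarity (assumed conversion, not the paper's)."""
    return 1.0 / (1.0 + path_length)


# Hypothetical topic signatures: invented words and weights.
signatures = {
    "dog": {"bark": 3.0, "pet": 2.0, "leash": 1.0, "road": 1.0},
    "cat": {"pet": 2.0, "purr": 3.0, "whisker": 1.0},
    "car": {"engine": 3.0, "road": 2.0, "wheel": 1.0},
}

# Hypothetical link distances between the concepts in a hierarchy.
path_lengths = {("dog", "cat"): 4, ("dog", "car"): 10, ("cat", "car"): 10}

pairs = sorted(path_lengths)
distributional = [cosine(signatures[a], signatures[b]) for a, b in pairs]
hierarchical = [link_similarity(path_lengths[pair]) for pair in pairs]

# Correlation between signature-based and hierarchy-based similarity.
r = pearson(distributional, hierarchical)
print(f"correlation over {len(pairs)} pairs: {r:.2f}")
```

With real data, the signatures would come from a corpus or the web and the path lengths from WordNet itself; a high correlation would suggest that signature similarity can stand in for link distance when a concept is not yet placed in the hierarchy.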

Related resources

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Publicly Available Topic Signatures for all WordNet Nominal Senses

Topic signatures are context vectors built for word senses and concepts. They can be automatically acquired from the web for any concept hierarchy using the “monosemous relative” method. Topic signatures have been shown to be useful in Word Sense Disambiguation, for modeling similarity between word senses, classifying new terms in hierarchies and also building hierarchical clusters of word sens...

Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet

Semantic similarity measurement of multilingual words is a challenging problem in data mining, information extraction, information retrieval, etc. This paper introduces an algorithm to measure the semantic similarity of Chinese-English bilingual words based on Chinese WordNet, an expansion of WordNet in Simplified Chinese. The algorithm not only measures the semantic similarity for Chinese and ...

Corpus-based Semantic Relatedness for the Construction of Polish WordNet

The construction of a wordnet, a labour-intensive enterprise, can be significantly assisted by automatic grouping of lexical material and discovery of lexical semantic relations. The objective is to ensure high quality of automatically acquired results before they are presented for lexicographers’ approval. We discuss a software tool that suggests synset members using a measure of semantic rela...

Highlighting relevant concepts from Topic Signatures

This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm...


Publication date: 2004